Neuromorphic Acceleration for Approximate Bayesian Inference on Neural Networks via Permanent Dropout
As neural networks have begun performing increasingly critical tasks for
society, ranging from driving cars to identifying candidates for drug
development, the value of their ability to perform uncertainty quantification
(UQ) in their predictions has risen commensurately. Permanent dropout, a
popular method for neural network UQ, involves injecting stochasticity into the
inference phase of the model and creating many predictions for each test
input. This shifts the computational and energy burden of deep neural networks
from the training phase to the inference phase. Recent work has demonstrated
near-lossless conversion of classical deep neural networks to their spiking
counterparts. We use these results to demonstrate the feasibility of conducting
the inference phase with permanent dropout on spiking neural networks,
mitigating the technique's computational and energy burden, which is essential
for its use at scale or on edge platforms. We demonstrate the proposed approach
via the Nengo spiking neural simulator on a combination drug therapy dataset
for cancer treatment, where UQ is critical. Our results indicate that the
spiking approximation gives a predictive distribution practically
indistinguishable from that given by the classical network.
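
A minimal NumPy sketch of the inference-time dropout idea described above, assuming a toy two-layer network with random weights (the paper itself uses a trained network converted to a spiking implementation in Nengo): dropout masks are resampled on every forward pass, and the spread of the resulting predictions serves as the uncertainty estimate.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer network with fixed (here random) weights.
W1 = rng.normal(size=(16, 64))
W2 = rng.normal(size=(64, 1))
p_drop = 0.5

def stochastic_forward(x):
    """One forward pass with dropout left active at inference time."""
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop      # resample the dropout mask
    h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
    return h @ W2

x_test = rng.normal(size=(1, 16))
samples = np.concatenate([stochastic_forward(x_test) for _ in range(200)])

print("predictive mean:", samples.mean())
print("predictive std :", samples.std())    # uncertainty estimate
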
A Gradient-Aware Search Algorithm for Constrained Markov Decision Processes
The canonical solution methodology for finite constrained Markov decision
processes (CMDPs), where the objective is to maximize the expected
infinite-horizon discounted rewards subject to the expected infinite-horizon
discounted costs constraints, is based on convex linear programming. In this
brief, we first prove that the optimization objective in the dual linear
program of a finite CMDP is a piecewise linear convex (PWLC) function with
respect to the Lagrange penalty multipliers. Next, we propose a novel two-level
Gradient-Aware Search (GAS) algorithm which exploits the PWLC structure to find
the optimal state-value function and Lagrange penalty multipliers of a finite
CMDP. The proposed algorithm is applied in two stochastic control problems with
constraints: robot navigation in a grid world and solar-powered unmanned aerial
vehicle (UAV)-based wireless network management. We empirically compare the
convergence performance of the proposed GAS algorithm with binary search (BS),
Lagrangian primal-dual optimization (PDO), and Linear Programming (LP).
Compared with benchmark algorithms, it is shown that the proposed GAS algorithm
converges to the optimal solution faster, does not require hyper-parameter
tuning, and is not sensitive to initialization of the Lagrange penalty
multiplier.
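
As a hedged illustration of the structure the GAS algorithm exploits (not the authors' two-level implementation), the sketch below evaluates the Lagrangian dual of a small random CMDP: for a fixed multiplier, the unconstrained MDP with reward r - lambda*c is solved by value iteration, and the resulting dual objective is piecewise linear and convex in the multiplier. The MDP sizes, cost budget, and coarse grid search are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 5, 3, 0.9

# Random finite CMDP: transition kernel P[s, a, s'], reward r, cost c.
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A))
c = rng.random((S, A))
cost_budget = 2.0

def value_iteration(reward, iters=500):
    """Optimal value function of the unconstrained MDP with the given reward."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = reward + gamma * P @ V           # shape (S, A)
        V = Q.max(axis=1)
    return V

def dual_value(lam):
    """Lagrangian dual g(lam) = max_pi E[r - lam*c] + lam*budget.
    Piecewise linear and convex in lam (finitely many deterministic policies)."""
    V = value_iteration(r - lam * c)
    return V.mean() + lam * cost_budget      # uniform initial-state distribution

# Coarse search over the multiplier; GAS replaces this with a
# gradient-aware search that exploits the PWLC structure.
lams = np.linspace(0.0, 5.0, 51)
vals = [dual_value(l) for l in lams]
print("minimizing multiplier:", lams[int(np.argmin(vals))])
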
Meta Continual Learning via Dynamic Programming
Meta-continual learning algorithms seek to rapidly train a model when faced
with similar tasks sampled sequentially from a task distribution. Although
impressive strides have been made in this area, there is no theoretical
framework that enables systematic analysis of key learning challenges, such as
generalization and catastrophic forgetting. We introduce a new theoretical
framework for meta-continual learning using dynamic programming, analyze
generalization and catastrophic forgetting, and establish conditions of
optimality. We show that existing meta-continual learning methods can be
derived from the proposed dynamic programming framework. Moreover, we develop a
new dynamic-programming-based meta-continual learning approach that adopts a
stochastic-gradient-driven alternating optimization method. We show that, on
meta-continual learning benchmark data sets, our theoretically grounded
meta-continual learning approach is better than or comparable to the purely
empirical strategies adopted by the existing state-of-the-art methods.
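
One possible reading of the stochastic-gradient-driven alternating optimization, sketched below purely for illustration: gradient steps on the current task alternate with gradient steps on data retained from earlier tasks, which is what limits catastrophic forgetting. The linear-regression tasks, memory scheme, and penalty weight are assumptions, not the paper's formulation.

import numpy as np

rng = np.random.default_rng(2)
d, lr, mem_weight = 8, 0.05, 0.5

def make_task():
    """A small linear-regression task standing in for a draw from the task distribution."""
    X = rng.normal(size=(32, d))
    w_true = rng.normal(size=d)
    return X, X @ w_true + 0.1 * rng.normal(size=32)

w = np.zeros(d)
memory = []                                   # data retained from earlier tasks

for task_id in range(5):
    X, y = make_task()
    for step in range(200):
        # Gradient step on the current task ...
        grad_new = X.T @ (X @ w - y) / len(y)
        w -= lr * grad_new
        # ... alternated with a step on a previously seen task (forgetting control).
        if memory:
            Xm, ym = memory[rng.integers(len(memory))]
            grad_old = Xm.T @ (Xm @ w - ym) / len(ym)
            w -= lr * mem_weight * grad_old
    memory.append((X, y))

print("final parameter norm:", np.linalg.norm(w))
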
Graph Neural Network Architecture Search for Molecular Property Prediction
Predicting the properties of a molecule from its structure is a challenging
task. Recently, deep learning methods have improved the state of the art for
this task because of their ability to learn useful features from the given
data. By treating molecule structure as graphs, where atoms and bonds are
modeled as nodes and edges, graph neural networks (GNNs) have been widely used
to predict molecular properties. However, the design and development of GNNs
for a given data set rely on labor-intensive design and tuning of the network
architectures. Neural architecture search (NAS) is a promising approach to
discover high-performing neural network architectures automatically. To that
end, we develop an NAS approach to automate the design and development of GNNs
for molecular property prediction. Specifically, we focus on automated
development of message-passing neural networks (MPNNs) to predict the molecular
properties of small molecules in quantum mechanics and physical chemistry data
sets from the MoleculeNet benchmark. We demonstrate the superiority of the
automatically discovered MPNNs by comparing them with manually designed GNNs
from the MoleculeNet benchmark. We study the relative importance of the choices
in the MPNN search space, demonstrating that customizing the architecture is
critical to enhancing performance in molecular property prediction and that the
proposed approach can perform customization automatically with minimal manual
effort.
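
A hedged sketch of the search-space view underlying the approach: candidate MPNN configurations are sampled and scored. The choice names, the random-search loop, and the placeholder evaluation function below are assumptions; the paper uses a more sophisticated NAS strategy and real MoleculeNet training runs.

import random

random.seed(0)

# Illustrative MPNN search space (the choices and names are assumptions).
search_space = {
    "hidden_dim":    [32, 64, 128, 256],
    "message_steps": [1, 2, 3, 4, 6],
    "aggregation":   ["sum", "mean", "max"],
    "update_cell":   ["gru", "mlp"],
    "readout":       ["set2set", "mean_pool", "sum_pool"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def evaluate(config):
    """Placeholder: in practice, build the MPNN from `config`, train it on a
    MoleculeNet task, and return the validation error."""
    return random.random()

best_config, best_score = None, float("inf")
for _ in range(20):                        # random search as a stand-in for NAS
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = evaluate(config)
    if score < best_score:
        best_config, best_score = config, score

print("best configuration found:", best_config)
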
MaLTESE: Large-Scale Simulation-Driven Machine Learning for Transient Driving Cycles
Optimal engine operation during a transient driving cycle is the key to
achieving greater fuel economy, engine efficiency, and reduced emissions. In
order to achieve continuously optimal engine operation, engine calibration
methods use a combination of static correlations obtained from dynamometer
tests for steady-state operating points and road and/or track performance data.
As the parameter space of control variables, design variable constraints, and
objective functions increases, the cost and duration for optimal calibration
become prohibitively large. In order to reduce the number of dynamometer tests
required for calibrating modern engines, a large-scale simulation-driven
machine learning approach is presented in this work. A parallel, fast, robust,
physics-based reduced-order engine simulator is used to obtain performance and
emission characteristics of engines over a wide range of control parameters
under various transient driving conditions (drive cycles). We scale the
simulation up to 3,906 nodes of the Theta supercomputer at the Argonne
Leadership Computing Facility to generate data required to train a machine
learning model. The trained model is then used to predict various engine
parameters of interest. Our results show that a deep-neural-network-based
surrogate model achieves high accuracy for various engine parameters such as
exhaust temperature, exhaust pressure, nitric oxide, and engine torque. Once
trained, the deep-neural-network-based surrogate model is fast for inference:
it requires about 16 microseconds to predict the engine performance and
emissions for a single design configuration compared with about 0.5 s per
configuration with the engine simulator. Moreover, we demonstrate that transfer
learning and retraining can be leveraged to incrementally retrain the surrogate
model to cope with new configurations that fall outside the training data
space.
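
A small sketch of the surrogate-modeling workflow, with synthetic data standing in for the engine-simulator outputs and a scikit-learn multilayer perceptron standing in for the deep surrogate; the feature count, layer sizes, and timing harness are illustrative assumptions.

import time
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

# Synthetic stand-in for simulator runs: map control parameters to an
# "engine response" (the real data come from the reduced-order engine simulator).
X = rng.uniform(size=(5000, 6))
y = np.sin(X @ rng.normal(size=6)) + 0.05 * rng.normal(size=5000)

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
surrogate.fit(X, y)

# Once trained, inference is orders of magnitude cheaper than the simulator.
x_new = rng.uniform(size=(1, 6))
t0 = time.perf_counter()
pred = surrogate.predict(x_new)
print("prediction:", pred, "in", time.perf_counter() - t0, "s")
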
Towards On-Chip Bayesian Neuromorphic Learning
If edge devices are to be deployed to critical applications where their
decisions could have serious financial, political, or public-health
consequences, they will need a way to signal when they are not sure how to
react to their environment. For instance, a lost delivery drone could make its
way back to a distribution center or contact the client if it is confused about
how exactly to make its delivery, rather than taking the action which is "most
likely" correct. This issue is compounded for health care or military
applications. However, the brain-realistic temporal credit assignment problem
that neuromorphic computing algorithms have to solve is difficult. The double
role that weights play in backpropagation-based learning, dictating how the
network reacts to both input and feedback, needs to be decoupled. e-prop 1 is a
promising learning algorithm that tackles this with Broadcast Alignment (a
technique where network weights are replaced with random weights during
feedback) and accumulated local information. We investigate under what
conditions the Bayesian loss term can be expressed in a similar fashion,
proposing an algorithm that can likewise be computed with only local
information and is thus no more difficult to implement on hardware. The
algorithm is demonstrated on a store-recall problem, suggesting that it can
learn good uncertainty estimates for decisions made over time.
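
A minimal NumPy sketch of the broadcast-alignment idea mentioned above (not e-prop itself; no spiking dynamics or Bayesian loss term): the backward pass routes the output error through a fixed random feedback matrix instead of the transposed forward weights, which is what decouples the two roles the weights play.

import numpy as np

rng = np.random.default_rng(4)
n_in, n_hid, n_out, lr = 10, 32, 2, 0.1

W1 = rng.normal(scale=0.5, size=(n_in, n_hid))
W2 = rng.normal(scale=0.5, size=(n_hid, n_out))
B  = rng.normal(scale=0.5, size=(n_out, n_hid))   # fixed random feedback weights

X = rng.normal(size=(256, n_in))
Y = (X[:, :n_out] > 0).astype(float)              # toy targets

for epoch in range(200):
    H = np.maximum(X @ W1, 0.0)                   # forward pass
    out = H @ W2
    err = out - Y                                 # output error
    # Backward pass: the error is broadcast through the fixed random matrix B,
    # not through W2.T as in exact backpropagation.
    dH = (err @ B) * (H > 0)
    W2 -= lr * H.T @ err / len(X)
    W1 -= lr * X.T @ dH / len(X)

print("final mse:", np.mean((np.maximum(X @ W1, 0.0) @ W2 - Y) ** 2))
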
Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders
A common strategy for the dimensionality reduction of nonlinear partial
differential equations relies on the use of the proper orthogonal decomposition
(POD) to identify a reduced subspace and the Galerkin projection for evolving
dynamics in this reduced space. However, advection-dominated PDEs are
represented poorly by this methodology since the process of truncation discards
important interactions between higher-order modes during time evolution. In
this study, we demonstrate that an encoding using convolutional autoencoders
(CAEs) followed by a reduced-space time evolution by recurrent neural networks
overcomes this limitation effectively. We demonstrate that a truncated system
of only two latent-space dimensions can reproduce a sharp advecting shock
profile for the viscous Burgers equation with very low viscosities, and a
six-dimensional latent space can recreate the evolution of the inviscid shallow
water equations. Additionally, the proposed framework is extended to a
parametric reduced-order model by directly embedding parametric information
into the latent space to detect trends in system evolution. Our results show
that these advection-dominated systems are more amenable to low-dimensional
encoding and time evolution by a CAE and recurrent neural network combination
than the POD-Galerkin technique.
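
A compact PyTorch skeleton of the proposed pipeline, with layer sizes and random snapshot data as illustrative assumptions: a 1D convolutional autoencoder compresses each snapshot to a small latent vector, and an LSTM advances the latent state in time.

import torch
import torch.nn as nn

torch.manual_seed(0)
n_x, latent_dim, seq_len = 128, 2, 20        # grid points, latent size, window length

class CAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, 8, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(8, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * (n_x // 4), latent_dim))
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 16 * (n_x // 4)), nn.ReLU(),
            nn.Unflatten(1, (16, n_x // 4)),
            nn.ConvTranspose1d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(8, 1, 4, stride=2, padding=1))

    def forward(self, u):                    # u: (batch, 1, n_x)
        z = self.enc(u)
        return self.dec(z), z

# LSTM that advances the latent state over a time window.
propagator = nn.LSTM(input_size=latent_dim, hidden_size=32, batch_first=True)
head = nn.Linear(32, latent_dim)

cae = CAE()
snapshots = torch.randn(seq_len, 1, n_x)     # stand-in for PDE solution snapshots
recon, z = cae(snapshots)                    # z: (seq_len, latent_dim)
z_seq = z.unsqueeze(0)                       # (1, seq_len, latent_dim)
h, _ = propagator(z_seq)
z_next = head(h)                             # predicted latent states
print(recon.shape, z_next.shape)

In practice both networks are trained on solution snapshots (reconstruction loss for the CAE, one-step latent prediction loss for the LSTM); the skeleton above only fixes the shapes of the two components.
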
Non-autoregressive time-series methods for stable parametric reduced-order models
Advection-dominated dynamical systems, characterized by partial differential
equations, are found in applications ranging from weather forecasting to
engineering design where accuracy and robustness are crucial. There has been
significant interest in the use of techniques borrowed from machine learning to
reduce the computational expense and/or improve the accuracy of predictions for
these systems. These rely on the identification of a basis that reduces the
dimensionality of the problem and the subsequent use of time series and
sequential learning methods to forecast the evolution of the reduced state.
Often, however, machine-learned predictions after reduced-basis projection are
plagued by stability issues stemming from incomplete capture of multiscale
processes and from error growth over long forecast durations. To
address these issues, we have developed a non-autoregressive time series
approach for predicting linear reduced-basis time histories of forward models.
In particular, we demonstrate that non-autoregressive counterparts of
sequential learning methods such as long short-term memory (LSTM) considerably
improve the stability of machine-learned reduced-order models. We evaluate our
approach on the inviscid shallow water equations and show that a
non-autoregressive variant of the standard LSTM approach that is bidirectional
in the PCA components obtains the best accuracy for recreating the nonlinear
dynamics of partial observations. Moreover, and critically for many
applications of these surrogates, inference times are reduced by three orders
of magnitude with our approach compared with both the equation-based Galerkin
projection method and the standard LSTM approach.
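
A schematic contrast between the two rollout styles, with placeholder models standing in for the learned networks (the paper's non-autoregressive model is a bidirectional LSTM acting on PCA coefficients): the autoregressive rollout feeds its own predictions back in, while the non-autoregressive model emits the whole coefficient history in one shot.

import numpy as np

rng = np.random.default_rng(5)
n_modes, n_steps = 6, 100

def step_model(state):
    """Placeholder one-step model: next reduced state from the current state."""
    return 0.99 * state + 0.01 * rng.normal(size=state.shape)

def window_model(param, times):
    """Placeholder non-autoregressive model: maps a parameter and the
    requested time stamps directly to the whole coefficient history."""
    return np.outer(np.exp(-0.01 * times), np.full(n_modes, param))

# Autoregressive rollout: errors can compound because each prediction
# is fed back in as the next input.
state = np.ones(n_modes)
auto_traj = []
for _ in range(n_steps):
    state = step_model(state)
    auto_traj.append(state)

# Non-autoregressive prediction: the full trajectory is produced in one
# shot from the parameters, so there is no feedback path for error growth.
times = np.arange(n_steps)
nonauto_traj = window_model(param=1.0, times=times)

print(np.array(auto_traj).shape, nonauto_traj.shape)
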
Deep-Ensemble-Based Uncertainty Quantification in Spatiotemporal Graph Neural Networks for Traffic Forecasting
Deep-learning-based data-driven forecasting methods have produced impressive
results for traffic forecasting. A major limitation of these methods, however,
is that they provide forecasts without estimates of uncertainty, which are
critical for real-time deployments. We focus on a diffusion convolutional
recurrent neural network (DCRNN), a state-of-the-art method for short-term
traffic forecasting. We develop a scalable deep ensemble approach to quantify
uncertainties for DCRNN. Our approach uses a scalable Bayesian optimization
method to perform hyperparameter optimization, selects a set of high-performing
configurations, fits a generative model to capture the joint distributions of
the hyperparameter configurations, and trains an ensemble of models by sampling
a new set of hyperparameter configurations from the generative model. We
demonstrate the efficacy of the proposed methods by comparing them with other
uncertainty estimation techniques. We show that our generic and scalable
approach outperforms the current state-of-the-art Bayesian method and a number
of other commonly used frequentist techniques.
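
A hedged sketch of the deep-ensemble recipe, with small scikit-learn regressors and synthetic data standing in for DCRNN and the traffic data; uniform sampling of configurations below stands in for the generative model fitted to high-performing configurations found by Bayesian optimization.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)

# Synthetic stand-in for traffic data (the paper uses DCRNN on sensor graphs).
X = rng.uniform(-1, 1, size=(2000, 4))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=2000)

# Sample hyperparameter configurations; in the paper these come from a
# generative model fitted to high-performing configurations.
configs = [
    {"hidden_layer_sizes": (int(rng.choice([32, 64, 128])),) * int(rng.integers(1, 3)),
     "alpha": 10.0 ** rng.uniform(-5, -2)}
    for _ in range(8)
]

ensemble = []
for cfg in configs:
    model = MLPRegressor(max_iter=400, **cfg)
    model.fit(X, y)
    ensemble.append(model)

x_new = rng.uniform(-1, 1, size=(5, 4))
preds = np.stack([m.predict(x_new) for m in ensemble])   # (n_models, n_points)
print("forecast mean:", preds.mean(axis=0))
print("forecast std :", preds.std(axis=0))               # uncertainty estimate
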
Site-specific graph neural network for predicting protonation energy of oxygenate molecules
Bio-oil molecule assessment is essential for the sustainable development of
chemicals and transportation fuels. These oxygenated molecules have adequate
carbon, hydrogen, and oxygen atoms that can be used for developing new
value-added molecules (chemicals or transportation fuels). One motivation for
our study stems from the fact that a liquid phase upgrading using mineral acid
is a cost-effective chemical transformation. In this chemical upgrading
process, adding a proton (positively charged atomic hydrogen) to an oxygen atom
is a central step. The protonation energies of oxygen atoms in a molecule
determine the thermodynamic feasibility of the reaction and likely chemical
reaction pathway. A quantum chemical model based on coupled cluster theory is
used to compute accurate thermochemical properties such as the protonation
energies of oxygen atoms and the feasibility of protonation-based chemical
transformations. However, this method is too computationally expensive to
explore a large space of chemical transformations. We develop a graph neural
network approach for predicting the protonation energies of oxygen atoms in
hundreds of oxygenated molecules, in order to assess the feasibility of aqueous acidic
reactions. Our approach relies on an iterative local nonlinear embedding that
gradually leads to global influence of distant atoms, and an output layer that
predicts the protonation energy. Our approach is geared to site-specific
predictions for individual oxygen atoms of a molecule, in contrast with
commonly used graph convolutional networks that focus on predicting a single
molecule-level property. We demonstrate that our approach is effective in
learning the locations and magnitudes of protonation energies of oxygenated
molecules.
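
A minimal NumPy sketch of the site-specific idea: a few rounds of neighbourhood averaging spread information across the molecular graph, and a per-node readout yields one prediction per oxygen atom rather than a single pooled molecule-level prediction. The random graph, features, and weights below are placeholders.

import numpy as np

rng = np.random.default_rng(7)
n_atoms, n_feat, n_hidden, n_steps = 9, 8, 16, 3

A = (rng.random((n_atoms, n_atoms)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T + np.eye(n_atoms)   # symmetric adjacency + self loops
H = rng.normal(size=(n_atoms, n_feat))             # initial atom features
is_oxygen = rng.random(n_atoms) < 0.3              # placeholder element labels

W_in  = rng.normal(scale=0.3, size=(n_feat, n_hidden))
W_msg = rng.normal(scale=0.3, size=(n_hidden, n_hidden))
w_out = rng.normal(scale=0.3, size=n_hidden)

# Iterative local embedding: each round mixes information from neighbours,
# so distant atoms gradually influence every site.
H = np.maximum(H @ W_in, 0.0)
deg = A.sum(axis=1, keepdims=True)
for _ in range(n_steps):
    H = np.maximum(((A @ H) / deg) @ W_msg, 0.0)

# Site-specific readout: one protonation-energy prediction per oxygen atom,
# instead of a single pooled molecule-level prediction.
site_energies = H @ w_out
print("predicted energies at oxygen sites:", site_energies[is_oxygen])
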